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Abstract 

We present a numerical method to compute an optimal maintenance date for the test case 
of the heated hold-up tank. The system consists of a tank containing a fluid whose level is 
controlled by three components: two inlet pumps and one outlet valve. A thermal power 
source heats up the fluid. The failure rates of the components depends on the temperature, 
the position of the three components monitors the liquid level in the tank and the liquid level 
determines the temperature. Therefore, this system can be modeled by a hybrid process where 
the discrete (components) and continuous (level, temperature) parts interact in a closed loop. 
We model the system by a piecewise deterministic Markov process, propose and implement a 
numerical method to compute the optimal maintenance date to repair the components before 
the total failure of the system. 



1 Introduction 

A complex system is inherently sensitive to failures of its components. One must therefore de- 
termine maintenance policies in order to maintain an acceptable operating condition. Optimizing 
the maintenance is a very important problem in the analysis of complex systems. It determines 
when it is best that maintenance tasks should be performed on the system in order to optimize 
a cost function: either maximize a performance function or conversely minimize a loss function. 
Moreover, this optimization must take into account the random nature of failures and random 
evolution and dynamics of the system. 

The example considered here is the maintenance of the heated hold-up tank, a well know test 
case for dynamic reliability, see e.g. Devooght(1997)| Marseguerra and Zio(1995) , Marseguerra and Zio(1996)| 



Zhang et al. (2008) Zhang, Dufour, Dutuit, and Gonzalez . The system consists of a tank contain- 



ing a fluid whose level is controlled by three components: two inlet pumps and one outlet valve. 
A thermal power source heats up the fluid. The failure rate of the components depends on the 
temperature, the position of the three components monitors the liquid level in the tank, and in 
turn, the liquid level determines the temperature. The main characteristic of this system is that it 
can be modeled by a stochastic hybrid process, where the discrete and continuous parts interact in 
a closed loop. As a consequence, simulating this process and computing related reliability indices 
has been a challenge for the dynamic reliability community. To our best knowledge, optimization 
of maintenance policies for the heated hold-up tank has not been addressed yet in the literature. 

The only maintenance operation considered here is the complete replacement of all the failed 
components and the system restarts in its initial equilibrium state. Partial repairs are not allowed. 
Mathematically, this problem of preventive maintenance corresponds to a stochastic optimal stop- 
ping problem as explained by example in the book of Aven and Jensen Aven and Jensen(1999)| . 



It is a difficult problem because of the closed loop interactions between the state of the compo- 
nents and the liquid level and temperature. A classical approach consists in using condition-based 
maintenance (CBM) to act on the system based on its current state and before its failure. One 
can for example calculate the remaining useful life (RUL) of the system and the preventive re- 
placement is carried out when the deterioration level exceeds a certain threshold or enters in a 
certain state van Noortwijk J. M. (2009), Grail et al. (2002) Grail, Berenguer, and Dieulle . Our ap- 
proach also takes into account the current state of the process, but our decision rule is not based 
on damage accumulation nor does it correspond to hitting some threshold. Instead, it involves a 
performance function that reflects that the longer the system is in a functioning state the better. 

The dynamics of the heated hold-up tank can be modeled by a piecewise deterministic Markov 
process (PDMP), see Zhang et al.(2008)Zh ang, Dufour, Dutuit, and Gonzalez] . Therefore, our 



maintenance problem boils down to an optimal stopping problem for PDMP's. PDMP's are a class 
of stochastic hybrid processes that has been introduced by Davis |Davis(1993)| in the 80's. These 
processes have two components: a Euclidean component that represents the physical system (e.g. 
temperature, pressure, . . . ) and a discrete component that describes its regime of operation and/or 
its environment. Starting from a state x and mode m at the initial time, the process follows a 
deterministic trajectory given by the laws of physics until a jump time that can be either random 
(e.g. it corresponds to a component failure or a change of environment) or deterministic (when a 
magnitude reaches a certain physical threshold, for example the pressure reaches a critical value 
that triggers a valve) . The process restarts from a new state and a new mode of operation, and so 
on. This defines a Markov process. Such processes can naturally take into account the dynamic and 
uncertain aspects of the evolution of the system. A subclass of these processes has been introduced 
by Devooght Devooght(1997)| for an application in the nuclear field. The general model has been 
introduced in dynamic reliability by Dutuit and Dufour Dufour and Dutuit(2002)] . 

As illustrated above, it is crucial to have an efficient numerical tool to compute the opti- 
mal maintenance time in practical cases. To this aim, a general numerical approach was de- 



veloped in de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez . It was first applied to an 



example of maintenance of a metallic structure subject to corrosion, without closed loop interac- 
tions or deterministic jumps, and with a simple cost function that did not depend on time, see 



de Saporta et al.(2012)de Saporta, Dufour, Zhang, and Elegbede . The objective of the present 



paper is to further demonstrate the high practical power of the theoretical methodology described 



de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez , by applying it to the more challeng- 



ing heated hold-up tank problem. The cost function chosen here is also more complex as it takes 
into account both continuous components as well as the running time. More precisely, we propose 
to compute the optimal cost as well as a quasi-optimal stopping rule, which is the date when 
the maintenance should be performed. As a by-product of our procedure, the distribution of the 
optimal maintenance dates is also obtained, as well as the distributions of the liquid level and 
temperature at the chosen maintenance date. 

The remainder of this paper is organized as follows. In section [2j the dynamics of the heated 
hold-up tank is presented with more details as well as the framework of PDMP's. In section [3] the 
formulation of the optimal stopping problem for PDMP's and its theoretical solution are briefly 
recalled and the four main steps of the algorithm are detailed. In section [4] the numerical results 
obtained on the example of tank are presented and discussed. Finally, in section [5] a conclusion 
and perspectives are presented. 



2 Model 

We are interested in the maintenance of a heated hold-up tank. The dynamics of the tank can be 
modeled by a piecewise deterministic Markov process (PDMP). We first describe with more details 
the dynamics of the tank, then we recall the definition and some basic properties of PDMP's. 

2.1 The heated hold-up tank 

The system is represented on Figure [I] It consists of a tank containing a fluid whose level is 
controlled by three components: two inlet pumps (units 1 and 2) and one outlet valve (unit 3). A 
thermal power source heats up the fluid. The variables of interest are the liquid level h, the liquid 
temperature 9 and the state of the three components and the controller. Each component has four 
states: ON, OFF, Stuck ON or Stuck OFF. The possible transitions between these four states are 
given in Figure [2j The initial state of the components is ON for units 1 and 3 and OFF for unit 2. 
The intensity of jumps A 1 for unit i depends on the temperature through the equation A 1 = a(9)P 
with 

. h exp (b c {6 - 20)) + b 2 exp (b d (9 - 20)) 
a W = hTTh • 

0\ + 2 

Function a(9) is represented on Figure [4] and the various parameters are given in Table [I] In 
addition, control laws are used to modify the state of the components to keep the liquid within two 
acceptable limits: 6 meters and 8 meters. If the liquid level drops under 6 m, the components I, 2, 3 
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Figure 1: The heated hold-up tank 




Figure 2: Transitions for unit i 



are put respectively in state ON, ON and OFF (provided they are not stuck). If the liquid level rises 
above 8 m, the components are put respectively in the state OFF, OFF and ON (provided they are 
not stuck). Unlike the class ical model presented in |Devooght(1997) Marseguerra and Zio(1995) 



Marseguerra and Zio(1996)| Zhang et al.(2008)Zhan g7T3ufour, Dutuit, and Gonzalez] , we also al 



low the control unit to fail. At each solicitation, the control may succeed with probability p = 0. 
independently from previous successes. Once it has failed, it will never succeed again. Therefore 
the control unit has two possible states: working 1 or failed 0. 

The evolution of the liquid level h depends on the position of the three components through 
the differential equation 

dh , . _ 

— = (y l + V 2 + y 3 )G, (1) 
where Vi = 1 if component i is ON or Stuck ON, = otherwise, and G is the common flow of 



Figure 3: a(9) as a function of 
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Table 1: Parameters for the tank dynamics 



the three components and is given in Table [[} The initial level is ho = 7 m. The temperature 9 
depends on the liquid level through the following differential equation 

f)9 

^ = {{v l + v 2 )G{6 m -9) + K)h-\ (2) 

where 9 in is the temperature of the incoming fluid, and K is a constant of the tank, the values 
of these parameters are given in Table [TJ The initial temperature is 9$ — 30.9261°C, so that the 
system is initially at an equilibrium state, and nothing happens until the failure of one of the 
components. The system stops as soon as one of the top events is reached: dry out (h < 4 m), 
overflow (h > 10 m) or hot temperature (9 > 100° C). 

2.2 Piecewise deterministic Markov processes 



As in Zhang et al.(2008)Zhang, Dufour, Dutuit, and Gonzalez , we model the tank by a Piecewise- 



deterministic Markov processes (PDMP). PDMP's are a general class of hybrid processes. They are 
defined as follows. Let M be the finite set of the possible modes of the system. In our tank example, 
the modes correspond to the possible positions of the inlet pumps, outlet valve and control unit. 
The components can be ON, OFF, stuck ON or Stuck OFF, the control unit can be in position 
or 1. Therefore, there are 128 possible modes for our system (but only 74 can actually be reached 
from the equilibrium starting point). 

For all modes m in M, let E m be an open subset in R d . In our case d = 3 as we need to take 
into account the running time as well as the liquid level and temperature, and E m is a subset of 
]4, 10[x [15, 100[x [0, +oo[ (depending on m). A PDMP is defined from three local characteristics 
(<f>, A, Q) where 

• the flow $ : M x R d x R — >• R d is continuous and for all s,t > 0, one has $(•, -,t + s) = 
<&($(•, •, s), t). It describes the deterministic trajectory of the process between jumps. In the 
tank example, it is given by the solution of Eq. ([T]) and |2]). For all (m, x) in M x E m , set 

t*(m, x) = ini{t > : $(m, x, t) E dE m }, 

the time to reach the boundary of the domain starting from x in mode m. For the tank, the 
boundary is one of the thresholds 4 m, 6 m, 8 m, 10 m, 100° C or 1000 hours for the running 
time. 

• the jump intensity A characterizes the frequency of jumps. For all (m,x) in M x E m , and 
t < t*(m, x), set 

A(m, x, t) — I A(<I>(m, x, s)) ds. 







For he tank the jump intensity given a mode m is the sum of the intensities A^ for the 
remaining possible jumps of the three units. 
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• the Markov kernel Q represents the transition measure of the process and allows to select 
the new location and mode after each jump. In our example, Q acts only on the mode 
components and leaves the liquid level h, temperature 9 and running time unchanged. It 
selects one of the remaining possible failures of the three components, or corresponds to an 
attempted control law. 

The trajectory X t — (m t ,Xt) of the process can then be defined iteratively. It starts at an 
initial point Xq = (fco,J/o) with fco E M and yo € Ek - For the tank, fco = (ON, OFF, ON, 1) and 
i/o = (7, 30.9261, 0). The first jump time T\ is determined by 

F (T e- A ( fc «^) if t<t*(k ,y ), 

It corresponds to the first failure time of one of the components as in our case t*(k , y n ) = +oo. On 
the interval [0,Ti), the process follows the deterministic trajectory m t — fco and a% = <fr(ko,yo,t). 
At the random time T\, a jump occurs. Note that a jump can be either a discontinuity in the 
Euclidean variable Xt or a change of mode. The process restarts at a new mode and/or position 
Xt x = (k\,yi), according to distribution Qfc ($(fco, j/o, Ti), •). An inter jump time T 2 — T\ is then 
selected in a similar way, and on the interval [Ti,T 2 ) the process follows the path m t — ki and 
Xt — $(fci, yi,t — T\). Thereby, iteratively, a PDMP is constructed, see Figure|4]for an illustration. 




Figure 4: An example of path for a PDMP until the second jump. The first jump is random. The 
second jump is deterministic because the process has reached the boundary of the domain. 



Let Zq = Xq, and for n > 1, Z„ — Xt n , location and mode of the process after each jump. 
Let Sq = 0, S\ = T\ and for n > 2, S n = T n — T„_i the inter-jump times between two consecutive 
jumps, then (Z n , S n ) is a Markov chain, which is the only source of randomness of the PDMP and 
contains all information on its random part. Indeed, if one knows the jump times and the positions 
after each jump, one can reconstruct the deterministic part of the trajectory between jumps. 
Namely, if one knows the time and nature of all the components failures, one can reconstruct the 
history of the liquid level and temperature though Eq. and It is a very important property 
of PDMP's that is at the basis of the numerical procedure. 

3 Optimization problem 



3.1 General framework 

The general mathematical problem of optimal stopping corresponding to this maintenance problem 
can be found in Gugcrli(1986)] de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez de Saporta et al.(2012)d< 
It is now briefly recalled. Let z = (fco,2/o) be the starting point of the PDMP (X t ). Let M / be 
the set of all stopping times r for the natural filtration of the PDMP (X t ) satisfying r < Tf 
that is to say that the intervention takes place before the time Tf = 1000 h. It has been 
shown in previous studies that by 1000 h, all the units are stuck and the events of interest have 
been observed, see e.g. |Devooght(1997) Marseguerra and Zio(1995) Marseguerra and Zio(1996) 
Zhang et al. (2008) Zhang, Dufour, Dutuit, and Gonzalez . Let g be the cost function to optimize 
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Here, g is a reward function that has to be maximized. The optimization problem to solve is the 
following 

v{z) = sup E z [g(X T )\ . 

reM f 

The function v is called the value function of the problem and represents the maximum performance 
that can be achieved. Solving the optimal stopping problem is firstly to calculate the value function, 
and secondly to find a stopping time r that achieves this maximum. This stopping time is important 
from the application point of view since it corresponds to the optimum time for maintenance. In 
general, such an optimal stopping time does not exist. Define then e-optimal stopping times as 
achieving optimal value minus e, i.e. v(z) — e. 

Under fairly weak regularity conditions, Gugerli has shown in Gugerli(1986)| that the value 



function v can be calculated iteratively as follows. First, choose the computational horizon N such 
that after N jumps, the running time t has reached Tf for almost all trajectories. Let vm = g be 
the reward function, and iterate an operator L backwards. The function v thus obtained is equal 
to the value function v. 

vn = g, 

v k = L(v k+1 ,g), 0<k<N-l. 

Operator L is complex and involves a continuous maximization, conditional expectations and 
indicator functions, even if the reward function g is very regular: 

L(w,g)(z) 

= sup {E [w(Zi)l {Sl<uAt , (z)} + g($(z,u))l {Sl > uAt . {z)} \Z = z]} 

u<t'(z) 

VE[w(Z 1 )\Z = z]. 

However, this operator depends only on the discrete time Markov chain (Z n ,S n ). Gugerli also 
proposes an iterative construction of e-optimal stopping times, which is too technical to be described 
here, see Gugerli(1986)| for details. 



In our example, the reward function has two components g(h,9,t) = f(h,9)t a . The first one 
/ depends on the liquid level and temperature, and reflects that the reward is maximal (set to 
1) when h and 9 are in the normal range (6 m < h < 8 m, 9 < 50° C), minimal (set to 0) when 
reaching the top events: dry out (h < 4 m), overflow (h > 10 m) or hot temperature (9 > 100° C) 
and continuous in between, see Figure [5] for an illustration. The second term t involves the time 
and reflects that the longer the system is functioning the higher the reward. The parameter a is 
set to 1.01 for smoothness. 

3.2 Numerical procedure 



In de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez de Saporta et al.(2012)de Saporta, Dufour, Zhang, a 



the authors propose a numerical method to approximate the value function for the optimal stop- 
ping problem of a general PDMP. The approach is based on a discretization of the operator L 
defined above. Our algorithm for calculating the value function is divided into three stages: a dis- 
cretization of the Markov chain (Z n , S n ), a path-adapted time discretization between jumps, and 
finally a recursive computation of the value function v. Then, the calculation of a quasi-optimal 
stopping time only uses comparisons of quantities already calculated in the approximation of the 
value function, which makes this technique particularly attractive. These stages are briefly recalled 
below. 

3.2.1 Quantization 

The goal of the first step is to approximate the continuous state space Markov chain (Z n ,S n ) by a 
discrete state space sequence (Z n , S n ). To this aim, we use the quantization algorithm described in 
details in e.g. |Pages(19 98) , Pag es and Pham(2005)| Pages et al.(2004a)Pages, Pham, and Printems 



Pages et al.( 2004b) Pages, Pham, and Printems]. Roughly speaking, more points are put in the ar- 



eas of high density of the random variable, see Figure [6] for an example of quantization grid for the 
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Figure 5: Reward function / as a function of h and 9 



standard normal distribution in two dimensions. The quantization algorithm is based on Monte 
Carlo simulations combined with a stochastic gradient method. It provides N+l grids, one for each 
couple [Z n , S n ) (0 < n < N), with a fixed number of points in each grid. The algorithm also pro- 
vides weights for the grid points and probability transition between two points of two consecutive 
grids fully determining the distribution of the approximating sequence (Z n , S n ). The quantization 
theory ensures that the L 2 distance between (Z n ,S n ) and (Z n ,S n ) tends to as the number of 



points in the quantization grids tends to infinity, see Pages et al.(2004a)Pages, Pham, and Printems 



3.2.2 Time discretization 

Now the continuous maximization of the operator L is replaced by a finite maximization, that 
is to say that one must discretize the time intervals [0,i*(z)] for a finite number of z, namely 
each z in the quantization grids. For this, choose time steps A(z) < t*(z) and take a reg- 
ular discretization G(z) of [0,t*(z) — A(z)] with step A(z). The maximum in such grids is 
less than t*(z) — A(z), which is a crucial property to derive error bounds for the algorithm, 



see de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez], 



3.2.3 Approximate calculation of the value function 

One now has all the tools to provide an approximation of the operator L. For each 1 < n < N, 
and for all z in the quantization grid at time n — 1, set 



L n (w,g)(z) 

[w(Z„_ 1 )l { ^ <itAt » W} + 5 ($(Z n _i,u))l { g n > uAt , (z)} |Z n _ 1 = z 
w(Z n )\Z n -i = z 



— max < E 

u<G(z) I 



VE 

Note that because there are different quantized approximations at each time step, there also are 
different discretizations of operator L at each time step. It should be also noted that the conditional 
expectations taken with respect to a process with finite state space are actually finite weighted 
sums. One then constructs an approximation of the value function by backward iterations of the 
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(a) Standard normal density in 2D 



(b) Quantization grid 



Figure 6: Example of quantization grid for a normal distribution (200 points) 



operators L n : 

vn = 9, 



v, 



-ifa-i) = L n (v n ,g)(Z n -i), l<n<N. 



Then take vq(Zq) = vq(z) as an approximation of the value function v at the starting point z 
of the PDMP. The difference between vq(z) and v goes to zero as the number of points in the 



quantization grids goes to infinity, see de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez 
for details and a convergence rate. 



3.2.4 Calculation of a quasi-optimal stopping time 

A method to compute an e-optimal stopping time has also been implemented. The discretization 
is much more complicated and subtle than that of operator L, because one needs both to use the 
true Markov chain (Z ni S n ) and its quantized version (Z n ,S n ). The principle is as follows: 

• At time 0, with the values Zq = z and So = 0, calculate a first date R\ which depends on 
Zq, Sq and on the value that has realized the maximum in the calculation of Li(v\,g). 

• Then the process is allowed to run normally until the time min{i?i, T\}. If R± comes first, it 
is the date of maintenance, if T\ (the date of the first failure) comes first, the calculation is 
reset. 

• At time T\ , with the values of Z\ and S\ , calculate the second date i?2 which depends on Z\ 
and 5*i and on the the value that has realized the maximum in the calculation of £2(^2, <?)■ 

• Then the process is allowed to run normally until the time min{(Ti -\- R2) . If T\ + R2 
comes first, it is the date of maintenance, if T2 comes first, reset the calculation, and so on 
until the iVth jump time or a total running time of 1000 h, whichever comes first, where 
maintenance will be performed if it has not occurred before. 

The quality of this approximation has been proved by comparing the expectation of the cost 
function of the process stopped by the above strategy to the true value function. This re- 
sult, its proof and the precise construction of our stopping time procedure can be found in 
de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez . 

This stopping strategy is interesting for several reasons. First, this is a real stopping time for 
the original PDMP which is a very strong theoretical result. Second, it requires no additional 
computation compared to those made to approximate the value function. This procedure can be 
easily performed in real time. Moreover, even if the original problem is an optimization on average, 
this stopping rule is path-wise and is updated when new data arrive on the history of the process 
at each new component failure. 
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to n=19 


18 


18 


n=26 


18 






Table 2: Theoretical and observed number of modes at each time step for 3 • 10 9 Monte Carlo 
simulations. 



4 Numerical results 

The numerical procedure described above has been implemented on the example of the heated 



holdup tank. We used the exact C++ simulator of trajectories developed for Zhang et al. (2008) Zhang, Dufour, Dut 
suitably modified to take into account the possible failures of the command and interfaced with a 
matlab code for the optimization procedure. The jump horizon N was empirically set to 26 jumps, 
thus allowing all the trajectories to reach one of the top events h < 4 m, h > 10 to, 9 > 100°C or 
t > 1000 hours. 



4.1 Quantization grids 

We encountered a new difficult in when deriving the quantization grids, due to the high cardinality 
of the possible modes and possibly low probably of some of them. Our mathematical model for the 
dynamics of the tank is hybrid: there is a discrete mode variable (the positions of the components 
and state of the control unit) and a continuous variable (liquid level, temperature, running time). 
Of course, one needs not discretize the mode variable as it can already take only finitely many 
values. Our procedure requires one discretization grid at each jump time of the process. However, 
at a given jump time, several modes can appear. For instance, at time 0, the starting mode is 
(ON, OFF, ON, 1). After the first jump time, one of the components has failed, so there are now 
6 possible modes: (StuckON ,OFF,ON ,1), (stuck OFF,OFF,ON,l), (ON, stuck ON, ON, I), 
(ON, stuck OFF,ON,l), (ON , OFF, stuck ON,l) or (ON, OFF, stuck OFF,l). Table [| gives 
the theoretical number of possible modes at each time step as well as the observed one for 3 • 10 9 
simulated trajectories. After 5 jumps, the theoretical number of modes is constant equal to 18, but 
all the 18 modes can actually be observed only as long as the controller unit does not fail. As the 
probability for the controller to remain in its operational state decreases with the number of trials 
of the control laws, the 18 modes become increasingly rare and by the 25-th jump time are not 
observed anymore, which means that the system has reached one of the top events and stopped. 

The comparatively rare events are problematic for the implementation of the quantization 
algorithm. Indeed, one first chooses the number k of discretization points then usually initializes 
the algorithm by throwing k trajectories of the process at random. Thus, some rare modes may not 
be reached by the initial simulations, and the algorithm will not perform well when new trajectories 
are thrown reaching these modes. Indeed, the algorithm is based on a nearest neighbor search, 
within the nodes having the same mode as the original point. When no such mode is present, the 
algorithm provides highly unsatisfactory results. Therefore, we had to find a way to ensure that 
the initializing grids have at least one point in each observed mode for each time step. To do so, 
at each time step we allocated the k points to the possible modes proportionally to their frequency 
(computed with 3 • 10 9 Monte Carlo simulations) and forcing 1 point for the modes with frequency 
less than 1/fc. 

4.2 Optimal performance and maintenance date 

We ran our optimization procedure for several number of discretization points in the quantization 
grids. The results we obtained are given in Table|3] The optimal performance is our approximation 
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200 


334.34 


305.55 


300 


333.04 


319.45 


400 


332.95 


322.20 


800 


330.43 


323.63 


1000 


330.87 


324.04 



Table 3: Optimal performance 



of the value function v at the starting equilibrium point whereas the last column gives the mean 
performance achieved by our stopping rule (obtained by 10 5 Monte Carlo simulations). One can 
observe the convergence as the number of points in the quantization grids increases, and one can 
also see that our stopping rule is indeed close enough to optimality. 

We can also obtain the distributions of the maintenance time by Monte Carlo simulations (10 5 
simulations) . It is given in Figure [7j The distribution is roughly bimodal, with a very high mode 
(14.16% of trajectories) at time 1000, which means that the tank remained in an acceptable state 
until the end of the experiment. For better readability of the histogram, only the value of r* < 1000 
are plotted on the right-hand side figure. This distribution illustrates that an average stopping rule 




(a) distribution (b) zoom of the distribution 



Figure 7: Histogram of the distribution of the computed maintenance time (with and without the 
mode at 1000) 



would be far from optimal for the tank. The distribution of the liquid level and temperature at 
the maintenance time are given on Figure [8j The distribution of the liquid level is almost discrete 



(a) Liquid level (b) Temperature 

Figure 8: Histogram of the liquid level (left) and temperature (right) at the maintenance time 

with three main values at level 6 m (15.16%), 7 m (12.68%) and 8 m (45.45%). This is natural as 
the liquid level is often constant, and moves very fast between these 3 states. The distribution of 
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the temperatures is also almost discrete as it is a function of the liquid level for most modes. It has 
a maximum at the equilibrium temperature (15,60%). Note that he top events are never reached 
for the liquid level, and only 0.02% of trajectories ended at 6 — 100°C. This is a strong point in 
favor of our procedure. 

4.3 Validation 

There is no analytical solution to our optimization problem, therefore it is impossible to compare 
our results with the true value function. However, it must be stressed out that there is a theoretical 



proof in de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez] that this procedure converges 



to the true value as the number of points in the quantization grids goes to infinity. 

We can also conduct two simple kinds of experiments to validate our results. First, we can 
simulate new trajectories and check one by one if the intervention took place at a reasonable time. 
Figure [9] shows such an interesting example. 




Figure 9: An example of trajectory. The upper chart shows the liquid level, the middle chart the 
temperature, and the bottom chart the corresponding reward, as a function of time. The small 
circles represent the jumps of regime, and the large circle and red line the computed maintenance 
time. 



• The system starts in the equilibrium state, so that the reward function grows roughly linearly 
with time. 

• At time 12.94 h, a jump occurs. The first unit is stuck OFF. The liquid level drops fast 
and the temperature rises slowly. As the liquid level and temperature are still within the 
acceptable bounds, the reward function is still growing roughly linearly with time. 

• At time 13.61 h, the liquid level reaches the boundary 6 m and the controller switches the 3 
units to stuck OFF, ON and OFF. The liquid level now rises whereas the temperature drops. 

• At time 14.94 h, the liquid level reaches the boundary 8 m and the controller switches the 
3 units back to stuck OFF, OFF and ON. The liquid level drops again and the temperature 
rises. 

• At time 16.27 h, the liquid level reaches again the boundary 6 m and the controller switches 
the 3 units back to stuck OFF, ON and OFF. The liquid level rises and the temperature 
drops. 



11 



• At time 17.38 h, before the liquid level reaches again 8 m, the second unit fails and is now 
stuck ON. The liquid level still rises and the temperature drops. 



• At time 17.60 h, the liquid level reaches the boundary 8 m and the controller switches the 
3 units to stuck OFF, Stuck ON and ON. The liquid level now remains constant as the 
temperature drops down to its equilibrium state. 

• At time 150.24 h, the third unit is stuck OFF. The liquid level rises again rapidly and goes 
above the acceptable threshold. The temperature remains constant. The algorithm decides 
to perform a maintenance at time 151.58. The final reward is 99.07. 

Another example is given in Figure [T0| 




Figure 10: An example of trajectory. The upper chart shows the fluid level, the middle chart the 
temperature, and the bottom chart the corresponding reward, as a function of time. The small 
circles represent the jumps of regime, and the large circle and red line the computed maintenance 
time. 



• The system starts in the equilibrium state, so that the reward function grows linearly with 
time. 

• At time 1.71 h, a jump occurs. The third unit is stuck OFF. The liquid level rises and 
the temperature remains constant. As the liquid level and temperature are still within the 
acceptable bounds, the reward function is still growing linearly with time. 

• At time 2.37 h, the liquid level reaches the threshold 8m and triggers the command. The 
solicitation of the command is successful, and the first unit is turned OFF. The current state 
is now OFF for units 1 and 2, and stuck OFF for unit 3. The liquid level remains constant 
at 8 to, but the temperature rises. At about 9 h, the temperature crosses the threshold 50° C 
so that the reward function is now slowly decreasing. Note that this does not trigger an 
immediate maintenance. 

• At time 18.22 h, another jump occurs and the second unit is now stuck ON, causing the 
liquid level to rise again, but the temperature to decrease. As a result, the reward function 
is now increasing again. 

• The algorithm then selects the maintenance time to be 18.8 h, before the liquid level reaches 
the level 10 to causing a total failure of the system. 
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without maintenance 


with maintenance 


mean performance 


211.80 


330.87 


null gain 


80.33% 


0.02% 


6 < ft < 8 


28.25% 


90.02% 


6 < 50°C 


80.33% 


95.09% 


ft = 4 


16.65% 


0% 


ft = 10 


54.55% 


0% 


6> = 100 


9.13% 


0.02% 



Table 4: Comparison of the performances with and without maintenance 



Note that at this intervention time, the first unit is not stuck, so that other jumps may happen in 
the future. In this special example, the intervention time occurs when the reward is maximal. 

Second, one can simply compare the performances of the system when no maintenance is 
performed to those with our maintenance rule. The results are summar ized in Table g] These 
results are obtained with 10 5 Monte Carlo simulations and illustrate the power of our procedure 
as the mean performance is increased by 156% and the top events are almost always avoided. 



5 Conclusion 



The numerical method described in de Saporta et al.(2010)de Saporta, Dufour, and Gonzalez has 



been applied to a well known test case of dynamic reliability to approximate the value function 
of the optimal stopping problem and an e-optimal stopping time for a piecewise-deterministic 
Markov process, that is the maintenance date for the tank. The quantization method proposed 
can sometimes be costly in computing time, but has a very interesting property: it can be calculated 
off-line. Moreover it depends only on the dynamics of the model, and not on the cost function 
chosen, or the actual trajectory of the specific process one wants to monitor. The calculation of 
the optimal maintenance time is done in real time, making our procedure applicable in practice. 
The optimal maintenance time is updated at the changes of mode and has a conditional threshold 
form, which allows scheduling maintenance services in advance. 

The method has been implemented on the heated hold-up tank. The main characteristic of this 
system is that it can be modeled by a stochastic hybrid process, where the discrete and continuous 
parts interact in a closed loop. The optimization problem under study has no analytic solution. 
However, our method is based on a rigorous mathematical contraction with proof of convergence. 
In addition, simple comparisons between no motoring and our policy also prove its practical validity 
with a significant improvement of the performance of the system. 

Our next project is to extend this research in two main directions. First, we could allow only 
partial repair of the system. The problem will then be to find simultaneously the optimal times 
of maintenance and optimal repair levels. Mathematically, it is an impulse control problem, which 
complexity exceeds widely that of the optimal stopping. Second, our method requires a perfect 
observation of the state process at the jump times. It would be interesting to extend our results 
to a noisy observation of the process, as often happens in real life. 
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